Three Non-Bayesian Methods of Spam Filtration: CRM114 at TREC 2007
Abstract
For the TREC 2007 conference, the CRM114 team considered three non-Bayesian methods of spam filtration in the CRM114 framework: an SVM based on the "hyperspace" feature==document paradigm, a bit-entropy matcher, and substring compression based on LZ77. As a calibration yardstick, we used the well-tested and widely used CRM114 OSB Markov random field system (basically unchanged since 2003). The results show that, on the TREC 2007 test sets, the SVM is about a factor of two to three more accurate than the OSB system, that substring compression is somewhat more accurate than OSB, and that bit entropy is somewhat less accurate.
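To illustrate the general idea behind compression-based classification, here is a minimal Python sketch. It is not the CRM114 implementation: it assumes Python's standard `zlib` module (DEFLATE, which combines LZ77 matching with Huffman coding) as a stand-in compressor, and the corpora, function names, and decision rule are hypothetical. A message is assigned to whichever class corpus lets it be encoded in the fewest additional compressed bytes.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of data after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))

def extra_bytes(corpus: bytes, message: bytes) -> int:
    """Additional compressed bytes needed when message is appended to corpus."""
    return compressed_size(corpus + message) - compressed_size(corpus)

def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Label the message by the corpus that compresses it more cheaply.

    Ties are resolved as "ham" (an arbitrary choice made for this sketch).
    """
    msg = message.encode("utf-8")
    spam_cost = extra_bytes(spam_corpus.encode("utf-8"), msg)
    ham_cost = extra_bytes(ham_corpus.encode("utf-8"), msg)
    return "spam" if spam_cost < ham_cost else "ham"

# Toy corpora, invented purely for illustration.
SPAM = "Buy cheap pills now. Limited offer, click here to win money. "
HAM = "Meeting agenda attached. Please review the quarterly report before Friday. "

if __name__ == "__main__":
    print(classify("Limited offer, click here to win money.", SPAM, HAM))
    print(classify("Please review the quarterly report before Friday.", SPAM, HAM))
```

Because a message that shares long substrings with a corpus can be encoded mostly as back-references into that corpus, the cost difference acts as a crude similarity measure. A real filter would need much larger corpora and would have to account for the compressor's limited window (32 KB for DEFLATE).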
Related papers
Seven Hypothesis about Spam Filtering
For TREC 2006, the CRM114 team considered several hypotheses on the topic of spam filtering. The hypotheses were that: (1) spammers were changing tactics to successfully evade content-based spam filters; (2) a pretrained database of known spam and non-spam improves overall accuracy; (3) repeated training methods are more effective than single-pass Train Only Errors training; (4) KNN/Hyperspace...
CRM114 versus Mr. X: CRM114 Notes for the TREC 2005 Spam Track
This paper discusses the design decisions underlying the CRM114 Discriminator software, how it can be configured as a spam filter, and what we may glean from the preliminary TREC 2005 results. Unlike most other filters, CRM114 is not a fixed-purpose antispam filter; rather, it’s a general purpose language meant to expedite the creation of text filters. The pluggable CRM114 architecture allows r...
Workload Characterization of Spam Email Filtering Systems
Email systems have suffered from degraded quality of service due to rampant spam, phishing and fraudulent emails. This is partly because the classification speed of email filtering systems falls far behind the requirements of email service providers. We are motivated to address this issue from the perspective of computer architecture support. In this paper, as the first step towards novel archi...
BUPT at TREC 2006: Spam Track
This report summarizes our participation in the TREC 2006 spam track, in which we consider the use of Bayesian models for the spam filtering task. First, our anti-spam filter, Kidult, is briefly introduced. We then try weighted adjustment of the separating hyperplane and a selective classifier ensemble to improve filtering performance. Finally, we summarize the relevant results from...
PRIS Kidult Anti-SPAM Solution at the TREC 2005 Spam Track: Improving the Performance of Naive Bayes for Spam Detection
Spam has recently become a serious problem for both e-mail users and Internet Service Providers (ISPs). Solutions to spam abuse need to be both technical and legal or regulatory. This paper reports our solution for the TREC 2005 spam track, in which we consider the use of a Naive Bayes spam filter for its desirable properties (simplicity, low time and memory requirements, etc.). Th...